IRIX Base Documentation 1998 November

Text File | 1998-10-30 | 41KB | 793 lines

PROF(1)                                                                PROF(1)

NAME
     prof - analyze SpeedShop performance data

SYNOPSIS
     prof [options] [speedshop_data_file | pixie_counts_file] ...
     prof [options] executable-name [speedshop_data_file | pixie_counts_file] ...

DESCRIPTION
     prof analyzes one or more data files generated by the SpeedShop performance tools and produces a report. (Note that most reports are formatted with long lines, and should be viewed in a window that is 135 characters wide, and printed in wide format.)

     The second form is used to analyze data files generated by the SpeedShop performance tools if the target program is not in the same directory as the data files (in which case <executable-name> should be the path to the target program). Multiple files can be included only if they are recorded from the same executable with the same experiment type.

     The default listing for all experiments lists functions in descending order of the appropriate exclusive performance metric (exclusive meaning from within the function itself, rather than included from the calls it makes). (See FUNCTION LIST below.) Options allow sorting by calls or by inclusive metrics, for those experiments where the recorded data supports them. Where applicable, the -b[utterfly] option also produces a listing of callers and callees for each function, with attribution percentages and time or counts. (See BUTTERFLY LIST below.) Additional listings may also be generated; see below.

     The current implementation supports the following SpeedShop experiments:

     usertime (callstack profiling, user+system time trigger) causes the program to be interrupted every 30 milliseconds during its running time (user or system mode, but not including any wait time), and to record the callstack at each interrupt. It can show both inclusive and exclusive time.
     usertime data is statistical in nature, and will show some variance from run to run.

     totaltime (callstack profiling, wall-clock time trigger) causes the program to be interrupted every 30 milliseconds of wall-clock time during the run, and to record the callstack at each interrupt. It can show both inclusive and exclusive time. totaltime data is statistical in nature, and will show some variance from run to run.

     [f]pcsamp[x] asks the kernel to look at the user PC every 10 milliseconds, and record a histogram of the value of the program counter at each clock tick, using 16-bit bins, one for each PC value. It can only show exclusive data, that is, data about where the program counter was, not the callstack to get there. The [f] prefix specifies 1 millisecond profiling, instead of 10 milliseconds. The [x] suffix specifies 32-bit count bins, instead of 16-bit. *pcsamp* data is statistical in nature, and will show some variance from run to run.

     *_hwc asks the kernel to look at the user PC every time the hardware performance counter specified by the experiment overflows, and record a histogram of the value of the program counter at overflow. These experiments can only be run on R10000 machines; other machines do not have the hardware performance counters. There are a number of these experiments defined; see speedshop(1). They can only show exclusive data, that is, data about where the program counter was, not the callstack to get there. The particular counter used and its overflow value are specified in the experiment.
     Some of the hardware performance counter sampling experiments are statistical in nature, and will show some variance from run to run; others are exact, provided that the program executes the exact same sequence of instructions. Among the interesting counter prefixes to use are:

          cy_     cycle counting
          gi_     graduated instructions
          ic_     primary instruction cache misses
          sic_    secondary instruction cache misses
          dc_     primary data cache misses
          sc_     secondary data cache misses
          tlb_    TLB misses
          gfp_    graduated floating-point instructions
          fsc_    failed store-conditional instructions

     *_hwctime (callstack profiling, R10K hardware counter overflow trigger) causes the program to be interrupted at every N overflows of the particular counter and to record the callstack at each interrupt. The value of N depends on the particular counter chosen. It can take the same set of prefixes as above, but since callstacks are recorded, it can show both inclusive and exclusive time. *_hwctime data is statistical in nature, and will show some variance from run to run.

     ideal causes the code to be instrumented to count the number of times each basic block is executed. (A basic block is a region of the program that can be entered only at the beginning and exited only at the end.) The data recorded also contains counts for all function pointer calls. From this data, a machine model is used to compute the exclusive time (cycles) spent in each function. Inclusive time computations, performed when the -b[utterfly] flag is specified to prof, calculate the exclusive time for each routine as above, and then propagate the time to each caller of each routine in proportion to the number of calls.
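
     As a rough illustration of the call-count propagation just described, the following Python sketch splits a callee's cycles among its callers in proportion to their call counts. The function name, the data layout, and the sample numbers (1000 cycles, callers making 25 and 75 calls) are chosen for illustration only; this is a sketch of the idea, not prof's actual implementation.

```python
def propagate_by_calls(callee_cycles, calls_by_caller):
    """Split callee_cycles among callers in proportion to call counts.

    callee_cycles   -- cycles to attribute upward (hypothetical input)
    calls_by_caller -- dict mapping caller name to number of calls made
    """
    total_calls = sum(calls_by_caller.values())
    if total_calls == 0:
        # No recorded calls: nothing can be attributed to any caller.
        return {caller: 0.0 for caller in calls_by_caller}
    return {
        caller: callee_cycles * calls / total_calls
        for caller, calls in calls_by_caller.items()
    }


if __name__ == "__main__":
    shares = propagate_by_calls(1000, {"foo": 25, "bar": 75})
    print(shares)  # {'foo': 250.0, 'bar': 750.0}
```

     Note that the split depends only on how often each caller calls the routine, not on how expensive each individual call was.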
     For example, if sin(x) takes 1000 cycles, and its callers, procedures foo() and bar(), call it 25 and 75 times respectively, 250 cycles are attributed to foo() and 750 to bar(). By propagating cycles this way, __start() should end up with all the cycles counted in the program. Note that the propagation according to the number of calls may not be reasonable for some routines, and may lead to misleading reports. For example, if a matrix-multiply routine is substituted for sin(x) in the above example, and bar's calls are for 2x2 matrices, while foo's calls are for 100x100 matrices, the attribution distributes 3/4 of the time to bar, whereas nearly all the time really should be attributed to foo.

     fpe asks the floating-point exception library to trace all floating-point exceptions, with their callstacks. These experiments show listings similar to those for usertime experiments, except the data reported is a count of floating-point exceptions, rather than projected CPU time.

     io asks for a trace of all IO calls made by the program. These experiments will show calls attributed to functions, and with -b[utterfly], can show which functions made which calls.

OUTPUT REPORTS
     prof writes an analysis of the performance data to stdout. The first thing in the report is a summary of the experiment, and a description of the environment in which it was recorded. That is followed by a header that summarizes the particular data recorded. Following that is the function list; if -b[utterfly] is given, and the data supports it, a butterfly list is also presented.
     If -h[eavy] or -l[ines] is given, for *pcsamp*, *_hwc, and ideal experiments, a report of data at the source line level is appended. With -h[eavy] it is sorted by the performance metric computed on a per-line basis; with -l[ines] it is sorted by function, and then by line number within each function. For other experiments, these options are ignored.

     If -basicblocks is given, for ideal experiments only, a report of data at the basic-block level is appended. If -archinfo is given, also for ideal experiments only, a summary report of register usage, instruction usage, and various other statistics is appended. For other experiments, these options are ignored.

     If -dsolist is given, a list of the DSOs used by the program is appended. If -usage is given, a summary of the resources used by the program is appended.

FUNCTION LIST
     The default output for any experiment is a function list, sorted in the order of exclusive values of the primary metric, that is, the performance cost of the function as computed from the data recorded. For many experiments, this metric is a time, printed in the report in seconds; for others, it is a count of events (FPEs, IO calls, or R10K counter overflows).

     The report begins with a legend line, naming each of the columns of data. Each line in the list has an index; if -b[utterfly] is not specified, the index will be in numerical order; if it is specified, the index will be in order of functions as sorted by the inclusive metric. The index serves as a poor-person's hyperlink through the butterfly, and between the butterfly and the function list.

     To the left of the index is the primary metric, followed by its representation as a percentage of the whole program, followed by the cumulative percentage up to that point in the list.
     The next two columns are the inclusive value of the metric and its representation as a percentage. Following that may be additional columns of data, as appropriate to the particular data recorded. Finally, rightmost on the line will be the name of the function, and the DSO and source file basenames.

BUTTERFLY LIST
     The butterfly list is a set of records that show the callers and callees of each function. The list is sorted in the order of inclusive values of the primary metric. For each function, its callers are shown above it, and its callees below.

     The center line is for the function itself, and shows the index at the beginning and end of the line. The second and third columns are the inclusive percentage of the primary metric and its absolute value. The next two columns of the center line are the exclusive percentage and the exclusive value of the metric, followed by the function identification.

     Callees are shown below the function, with attribution percentages and values lined up below the self percentages and values of the center function. In absolute mode (the default), the percentages, including the center node's self value, should add up to the inclusive percentage of the center node; in relative mode, specified with the -rel[ative] flag, the percentages should add up to 100%. In either mode, the attribution values should add up to the inclusive time of the central function. The attribution data for callees is followed by the inclusive value for that callee and callsite, followed by the callsite identification, with an address, a source file, and a line number. For callees, the source file will be that of the central function.

     The callers are shown above the central function, also with attribution percentages and values, but these have a different meaning, and are aligned with the central node's inclusive values to indicate this.
     The attribution percentage and value in a caller's line represent the percentage and value, respectively, of the central function's metric that was attributed to that callsite. In absolute mode, the percentages should add up to the central function's percentage; in relative mode, they should add up to 100%. The attribution values should add up to the central function's values. The attribution data for callers is followed by the inclusive value for that caller and callsite, followed by the callsite identification, with an address, a source file, and a line number. For callers, the source file will be that of the calling function.

LINE-LEVEL LISTS
     When invoked with the -h[eavy] or -l[ines] arguments, a list of line-level data is produced. When -h[eavy] is used, the line list is sorted by the primary metric associated with each line, with lines from the various DSOs and source files intermixed. When -l[ines] is used, the lines are sorted based on the function they came from, with all lines with non-zero values from each function printed before any lines from the next most important function.

     For each line, the leftmost column contains the value for the primary metric computed from the particular experiment, followed by its representation as a percentage, followed by the cumulative percentage over all lines printed thus far. These data are followed by any additional metrics for the particular experiment, and then by the name of the function, its DSO, file and line number.

BASIC BLOCK LIST
     When invoked with the -basicblocks argument (applicable to ideal experiments only), a list of all the basic blocks in the program that were executed is generated. It is preceded by a header and a column header line, followed by the list of basic blocks, in order of total cycles.
     Each basic block is printed with its index, the number of cycles per execution of the block, the count of executions, the total cycles, and the total cycles represented as a percentage. Following that on each line are the function and the address of the beginning of the line, with the DSO, source file, and source line number.

ARCHITECTURAL INFORMATION REPORT
     When invoked with the -archinfo argument (applicable to ideal experiments only), a list of various metrics concerning execution of the program is printed. The report consists of a header and a number of subsections, each with the appropriate headers.

     For integer registers, counts are printed of the number of times each register was used, and its percentage, the number of times each register was used as a base register, and its percentage, and the number of times each register was used as a destination register, and its percentage. For floating-point registers, the same data, less the base count statistics, is printed.

     Following the register usage statistics, a number of counts of instruction types or sequences are printed, each with a description, followed by a list of all the different instructions used, sorted by the number of times each was executed. For each instruction, the dynamic count of executions and its representation as a percentage are printed. That is followed by the cumulative execution percentage, and the count of the number of distinct instructions of that type that were executed one or more times, and that count represented as a percentage.

DSO LIST
     When invoked with the -dsolist argument, a report of summary information about the DSOs in the execution of the program is printed. For each DSO, it gives the name, a count of instructions, functions, source files, and source lines, the high and low addresses, and the full pathname to the DSO.
     A DSO is flagged as ignored if it is left out of the computation for any of the following reasons: it belongs to the SpeedShop runtime and the -showss flag is not provided; it was excluded by being listed with a -xdso argument; or it was excluded by virtue of not being listed with a -dso argument.

RESOURCE USAGE
     When invoked with the -usage argument, a report of summary usage data, as measured by the kernel during the run, is produced. The data consists of both per-process (per-file) and per-system metrics. Included are real-time (wall clock time), user, system and wait times. The per-process data is shown as a sum of the data from the experiment files; the real-time and the system-wide statistics are printed as maxima over the data in each file.

     Included are the accounting timers, giving the time spent in each of the various process states. The sum of the accounting timers should be approximately equal to the elapsed real-time, since the process must always be in one of the states. There is some skew in the reading of the data, so some discrepancy should be expected.

     Other summary statistics include bytes read and written, page faults, context switches, system calls, and process-size statistics.

DISASSEMBLY AND SOURCE LISTS
     When invoked with the -dis argument, an assembly listing of the program is generated. If the -S argument is given as well, the source code for the disassembly is intermixed with it.

OPTIONS
     -calipers [n1,]n2
          Causes prof to compute the data between caliper points n1 and n2, rather than for the entire experiment.
          If n1 >= n2, an error is reported; otherwise, if n1 is negative it is set to the beginning of the experiment, and if n2 is greater than the maximum recorded, it is set to the maximum. If n1 is omitted, zero is assumed.

     -b[utterfly] or -gprof
          Causes prof to print a report showing the callers and callees of each function, with inclusive time attributed to each. For ideal experiments, the attribution is based on a heuristic, while for the various callstack sampling/tracing experiments the attribution is precise (although usertime and totaltime, as well as some *_hwctime experiments, are statistical in nature). This flag is ignored for experiments where the data does not support inclusive calculations.

     -p[threads] <pthread_id1>,<pthread_id2>,...,<pthread_idn>
          For usertime, totaltime, *_hwctime, io, and fpe experiments on applications that use pthreads (on IRIX 6.5 or later), analyze data only for the specified pthreads. This flag is ignored for other experiments.

     -u[sage]
          Print a report on system statistics and timers.

     -dis[assemble]
          Disassemble and annotate the analyzed object code with cycle times or number of PC samples. This option can be used when generating reports for ideal, pcsamp, or prof_hwc experiments.
     -S (-source)
          Disassemble and annotate the analyzed object code with cycle times (or PC samples) and interleave the source code. This option can be used when generating reports for ideal, pcsamp, or *_hwc experiments.

     -h[eavy]
          Reports the most heavily used lines in descending order of use. This option can be used when generating reports for ideal, pcsamp, or prof_hwc experiments. It is ignored for other experiments.

     -l[ines]
          Like -h[eavy], but group lines by procedure, with procedures sorted in descending order of use. Within a procedure, lines are listed in source order. This option can be used when generating reports for ideal, pcsamp, or prof_hwc experiments. It is ignored for other experiments.

     -[no]cordfb
          Disables or enables cord feedback file generation for the executable only. Cord feedback is used to arrange procedures in the binary in an optimal ordering, to improve both paging and instruction-cache performance. Users can use cord(1) (or, in the near-term future, ld(1)) to actually do the procedure ordering.

     -cordfball
          Enables cord feedback for the executable and all DSOs.

     -feedback
          Produces files with information that can be used to (a) tell the compiler how to optimize compilation of the program next time and (b) arrange procedures in the binary in an optimal ordering. Users can invoke "cc(1) -fb <cfb-filename>" to use the compilation-optimization feedback file for subsequent compilations. To disable cord feedback while producing compiler feedback, use the options -feedback -nocordfb together.
          Procedures are normally ordered by their measured invocation counts; if -gprof is also specified, procedures are ordered using call graph counts (that capture caller-callee relationships as well), rather than invocation counts. The cord feedback file is named <a.out>.fb or <lib*so*>.fb. The -feedback option produces cord feedback for the executable only. To get cord feedback for all the DSOs as well, use the options -feedback -cordfball together.

          This option also produces a file with information that the compiler system can use to recompile, optimizing by using measured branch frequencies, etc. The feedback file is produced for the executable only. It is named <a.out>.cfb and is a binary file. It may be dumped using fbdump(1). This option can only be used with pixie (ideal-time) data files, and the data should be recorded on a binary that was compiled -O0. Recording data on binaries with higher optimization will generate a feedback file that does not have the appropriate correspondence between source lines and machine code.

     -ws
          Generate a working-set file for the current caliper setting, for the executable only.

     -wsall
          Generate a working-set file for the current caliper setting, for the executable and all the non-ignored DSOs.

     -showss
          Enables display of functions from the SpeedShop runtime. Normally such functions are suppressed from the reports and computations. In addition, some statistics for prof's own memory usage will be printed.

     -dsolist
          List all the DSOs in the program and their start and end text addresses.

OUTPUT CONTROLS
     -rel[ative]
          Show percentage attribution in a butterfly report relative to the central function.
          The default is to show percentages as absolute percentages over the whole run.

     -inclusive
          Sort function list by inclusive data, rather than by exclusive data. This option can only be used when generating reports for those experiments which have inclusive data; it is ignored for others.

     -calls
          Sort function list by procedure calls rather than by time. This option can only be used when generating reports for ideal experiments, or for basic block counting data obtained with pixie.

     -q[uit] n
          Truncates the -h[eavy], -l[ines] and -b[utterfly] listings after the first n procedures or lines have been listed.

     -q[uit] n%
          Truncates the -h[eavy], -l[ines] and -b[utterfly] listings so that only those procedures or lines that each take more than n percent of the total are listed.

     -q[uit] ncum%
          Truncates the -h[eavy], -l[ines] and -b[utterfly] listings after those procedures or lines up to the one which brings the cumulative total to n percent. (For -b[utterfly], it behaves the same as -quit n%.)

          For example, "-q[uit] 15" truncates each part of the listing after 15 lines of text, "-q[uit] 15%" truncates each part after the first line that represents less than 15 percent of the whole, and "-q[uit] 15cum%" truncates each part after the line that brought the cumulative percentage above 15 percent.

     -dislimit n
          Disassemble only those basic blocks with frequency >= n.

     -nh
          Suppress various header blocks from the output.

SELECTIVITY OPTIONS
     -dso dso_name
          Report only on the named DSO. Only the basename of the DSO need be specified, not the full pathname to the DSO; the .so suffix is required.
          Multiple instances of the -dso flag can be given, and the executable is considered a DSO, like any other. All the DSOs from an experiment can be listed with the -dsolist flag.

     -xdso dso_name
          Exclude the named DSO from any reports. Only the basename of the DSO need be specified, not the full pathname to the DSO; the .so suffix is required. Multiple instances of the -xdso flag can be given.

     -o[nly] procedure_name
          If you use one or more -o[nly] options, the profile listing includes only the named procedures, rather than the entire program. If any option uses an uppercase "O" for -O[nly], prof uses only the named procedures, rather than the entire program, as the base upon which it calculates percentages.

     -e[xclude] procedure_name
          If you use one or more -e[xclude] options, the profiler omits the specified procedures from the listing. If any option uses an uppercase "E" for -E[xclude], prof also omits that procedure from the base upon which it calculates percentages.

CPU OPTIONS
     prof normally uses the scheduling model for the processor on which it is being run to perform the analysis. The user can override the default with any of the following options:

     {-r10000 | -r8000 | -r5000 | -r4000 | -r3000}

     Note that these options are only meaningful for ideal time and pixie-counts data.

     -clock megahertz
          Set the CPU clock speed to megahertz MHz. Alters the appropriate parts of the listing to reflect the clock speed.
          The default value is the clock speed of the machine on which the experiment was performed.

     -cycle nanosecond
          Set cycle time to nanosecond ns. This is the same as -clock 1000/nanosecond.

DEBUGGING OPTIONS
     -debug:debug_flags
          Where debug_flags can be combinations of the following:

          GPROF_FLAG       0x00000001
          COUNTS_FLAG      0x00000002
          SAMPLE_FLAG      0x00000004
          MISS_FLAG        0x00000008
          FEEDBACK_FLAG    0x00000010
          CORD_FLAG        0x00000020
          USERPC_FLAG      0x00000040
          MDEBUG_FLAG      0x00000080
          BEAD_FLAG        0x00000100

DIAGNOSTICS
     prof prints warnings and fatal errors on stderr.

     With inclusive cycle counting, prof prints a list of functions at the end which are called but not defined. It is normal for functions starting with _rld to be listed here; they appear because rld is not instrumented.

     One way to sanity-check inclusive cycle counts is to look at the percentage of cycles for __start(). If the value is anything less than 98-99%, the inclusive report is suspect. Look for other warnings which indicate that prof didn't take into account certain procedures.

     There are a number of known cases when prof fails to list cycles of a procedure in the inclusive listing. The reasons can be one of the following:

     - init and fini sections and MIPS stubs are not part of any procedure.

     - calls to procedures which don't use a jump and link are not recognized as procedure calls.

     - executions of global procedures with the same name in different DSOs. In this case, only one of them is listed.

     All these exceptions are listed at the end of the -gprof listing, under a separate section.
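
     The __start() sanity check described above can be sketched in Python as follows. The function name is hypothetical, and the 98.0 default cutoff is an assumption taken from the low end of the 98-99% range quoted above; the percentage itself must be read from the inclusive listing by hand, since this sketch does not parse prof output.

```python
def inclusive_report_is_suspect(start_inclusive_percent, threshold=98.0):
    """Return True if __start() covers less than `threshold` percent.

    start_inclusive_percent -- inclusive percentage reported for __start()
    threshold               -- assumed cutoff; the text gives 98-99%
    """
    return start_inclusive_percent < threshold


if __name__ == "__main__":
    print(inclusive_report_is_suspect(99.6))  # False
    print(inclusive_report_is_suspect(91.2))  # True
```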
LIMITATIONS
     With both pc-sampling and basic block counting, prof may report times for procedures named with a prefix of *DF*, for example *DF*_hello.init_2. DF stands for Dummy Function, and indicates cycles spent in parts of text which are not in any function, for example, init and fini sections, and MIPS.stubs sections. For pc-sampling, and hardware counter overflow pc-sampling, a dummy function, *DF*_Other, is used to report sampling hits in an overflow bin; these hits include time spent in rld, and any other text regions not listed in any DSO.

     If any of the object files linked into the application have been stripped of line-number information (with ld -x for example), prof warns about the affected procedures. The instruction counts for such procedures are shown as a procedure total, not on a per-basic-block basis. Where a line number would normally appear in a report on a function without line numbers, question marks appear instead.

     prof does not take into account interactions between basic blocks in ideal time runs. prof computes the cycles for one execution of a basic block, assuming all registers are free at entry to the block, and it multiplies this count by the number of times that basic block is executed. In real programs, a block may be entered with a result not yet ready in a register, or with a function unit busy, so the cycle count computed could be either higher or lower than the correct value. Extending the computations to include inter-block state would be prohibitively expensive.

     When run on a program using shared libraries, prof sometimes combines the times of real but anonymous procedures in a shared library into the preceding (according to the library memory layout) externally visible function. The times reported for your procedures are not affected by this attribution error, which is normally minor.
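
     The ideal-time model described earlier in this section, in which the cycles for one execution of a basic block are multiplied by the number of times the block executes, amounts to the arithmetic in the following Python sketch. The function name and the sample block data are invented for illustration; the real per-block cycle counts come from prof's machine model, which is not reproduced here.

```python
def ideal_cycles(blocks):
    """Sum cycles over basic blocks, ignoring inter-block interactions.

    blocks -- iterable of (cycles_per_execution, execution_count) pairs
    """
    return sum(cycles * count for cycles, count in blocks)


if __name__ == "__main__":
    # Three hypothetical basic blocks of one function.
    blocks = [(12, 1000), (7, 250), (30, 1)]
    print(ideal_cycles(blocks))  # 13780
```

     Because each block is costed in isolation, the total can differ from the cycles an actual run would take.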
     prof cannot be run on programs which have been stripped.

     Compiler optimization level 3 does procedure inlining. This can result in extremely misleading profiles, since the time spent in the inlined procedure shows up in the profile as time spent in the procedure into which it was inlined. It is generally better to use compiler optimization level 2 or less when gathering an execution profile.

     Fortran alternate entry point times are attributed to the main function/subroutine, since there is no general way for prof to separate the times for the alternate entries.

SEE ALSO
     speedshop(1), ssrun(1), ssdump(1), sscord(1), ssorder(1), pixie(1), fbdump(1)